{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "j4qMNV_533ls"
},
"source": [
"##### Copyright 2024 Google LLC."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "urf-mQKk348O"
},
"outputs": [],
"source": [
"# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iS5fT77w4ZNj"
},
"source": [
"# Gemma - Run with Ollama Python library\n",
"\n",
"Author: Sitam Meur\n",
"\n",
"* GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)\n",
"* X: [@sitammeur](https://x.com/sitammeur)\n",
"\n",
"Description: This notebook demonstrates how you can run inference on a Gemma model using [Ollama Python library](https://github.com/ollama/ollama-python). The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.\n",
"\n",
"<table align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Using_with_Ollama_Python.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FF6vOV_74aqj"
},
"source": [
"## Setup\n",
"\n",
"### Select the Colab runtime\n",
"To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:\n",
"\n",
"1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
"2. Select **Change runtime type**.\n",
"3. Under **Hardware accelerator**, select **T4 GPU**."
]
},
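{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, run the cell below to confirm that a GPU is attached to the runtime. It simply calls `nvidia-smi`; if it fails, revisit the runtime settings above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check: list the GPU attached to this Colab runtime\n",
"!nvidia-smi"
]
},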
{
"cell_type": "markdown",
"metadata": {
"id": "4tlnekw44gaq"
},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AL7futP_4laS"
},
"source": [
"Install Ollama through the offical installation script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MuV7cWtcAoSV"
},
"outputs": [],
"source": [
"!curl -fsSL https://ollama.com/install.sh | sh"
]
},
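{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, confirm the installation by printing the Ollama version. (The CLI may warn that no server is running yet; that is expected at this point.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check: print the installed Ollama version\n",
"!ollama --version"
]
},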
{
"cell_type": "markdown",
"metadata": {
"id": "FpV183Rv6-1P"
},
"source": [
"Install Ollama Python library through the official Python client for Ollama."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0Mrj29SH-3OD"
},
"outputs": [],
"source": [
"!pip install -q ollama"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iNxJFvGIe48_"
},
"source": [
"## Start Ollama\n",
"\n",
"Start Ollama in background using nohup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5CX39xKVe9UN"
},
"outputs": [],
"source": [
"!nohup ollama serve > ollama.log &"
]
},
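{
"cell_type": "markdown",
"metadata": {},
"source": [
"The server can take a few seconds to come up. The cell below is a minimal readiness check (not part of the Ollama library): it polls Ollama's default local endpoint, `http://127.0.0.1:11434`, until the server responds. If you changed the host or port, adjust the URL accordingly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import urllib.error\n",
"import urllib.request\n",
"\n",
"# Poll the default Ollama endpoint (assumed address: http://127.0.0.1:11434) until it responds\n",
"OLLAMA_URL = \"http://127.0.0.1:11434\"\n",
"\n",
"for _ in range(30):\n",
"    try:\n",
"        with urllib.request.urlopen(OLLAMA_URL) as resp:\n",
"            print(f\"Ollama server is up (HTTP {resp.status})\")\n",
"            break\n",
"    except (urllib.error.URLError, ConnectionError):\n",
"        time.sleep(1)\n",
"else:\n",
"    print(\"Ollama server did not respond in time; check ollama.log\")"
]
},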
{
"cell_type": "markdown",
"metadata": {
"id": "6YfDqlyo46Rp"
},
"source": [
"## Prerequisites\n",
"\n",
"* Ollama should be installed and running. (This was already completed in previous steps.)\n",
"* Pull the gemma2 model to use with the library: `ollama pull gemma2:2b`\n",
" * See [Ollama.com](https://ollama.com/) for more information on the models available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oPU5dA1-B5Fn"
},
"outputs": [],
"source": [
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NE1AWlucBza_"
},
"outputs": [],
"source": [
"ollama.pull('gemma2:2b')"
]
},
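{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, list the locally available models to confirm that `gemma2:2b` was pulled. The exact structure of the response can vary between library versions, so the cell below simply prints it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check: list local models to confirm the pull succeeded\n",
"print(ollama.list())"
]
},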
{
"cell_type": "markdown",
"metadata": {
"id": "KL5Kc6HaKjmF"
},
"source": [
"## Inference\n",
"\n",
"Run inference using Ollama Python library."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HN0nrhpmFUUB"
},
"source": [
"### Generate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "AC5FsDQuFZfb"
},
"outputs": [],
"source": [
"import markdown\n",
"from ollama import generate\n",
"\n",
"# Generate a response to a prompt\n",
"response = generate(\"gemma2:2b\", \"Explain the process of photosynthesis.\")\n",
"print(response[\"response\"])"
]
},
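{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generation options\n",
"\n",
"You can also pass sampling parameters through the `options` argument, which accepts the same keys as an Ollama Modelfile (for example, `temperature` and `num_predict`). The values below are arbitrary and only illustrate the call shape."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Generate with explicit sampling options (example values are arbitrary)\n",
"response = generate(\n",
"    \"gemma2:2b\",\n",
"    \"Explain the process of photosynthesis.\",\n",
"    options={\"temperature\": 0.7, \"num_predict\": 128},\n",
")\n",
"print(response[\"response\"])"
]
},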
{
"cell_type": "markdown",
"metadata": {
"id": "p8v050JkFhuY"
},
"source": [
"#### Streaming Responses\n",
"\n",
"To enable response streaming, set `stream=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jw8UD2v5Fms6"
},
"outputs": [],
"source": [
"# Stream the generated response\n",
"response = generate('gemma2:2b', 'Explain the process of photosynthesis.', stream=True)\n",
"\n",
"for part in response:\n",
" print(part['response'], end='', flush=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "utyVRlIYFvdm"
},
"source": [
"#### Async client\n",
"\n",
"To make asynchronous requests, use the `AsyncClient` class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JVvKQE0XF2kh"
},
"outputs": [],
"source": [
"import asyncio\n",
"import nest_asyncio\n",
"from ollama import AsyncClient\n",
"\n",
"nest_asyncio.apply()\n",
"\n",
"\n",
"async def generate():\n",
" \"\"\"\n",
" Asynchronously generates a response to a given prompt using the AsyncClient.\n",
"\n",
" This function creates an instance of AsyncClient and sends a request to generate\n",
" a response for the specified prompt. The response is then printed.\n",
" \"\"\"\n",
" # Create an instance of the AsyncClient\n",
" client = AsyncClient()\n",
"\n",
" # Send a request to generate a response to the prompt\n",
" response = await client.generate(\n",
" \"gemma2:2b\", \"Explain the process of photosynthesis.\"\n",
" )\n",
" print(response[\"response\"])\n",
"\n",
"# Run the generate function\n",
"asyncio.run(generate())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WFyi_zzWAwe7"
},
"source": [
"### Chat"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UN20MoSv_S76"
},
"outputs": [],
"source": [
"from ollama import chat\n",
"\n",
"# Start a conversation with the model\n",
"messages = [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"What is keras?\",\n",
" },\n",
"]\n",
"\n",
"# Get the model's response to the message\n",
"response = chat(\"gemma2:2b\", messages=messages)\n",
"print(response[\"message\"][\"content\"])"
]
},
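{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Multi-turn conversations\n",
"\n",
"Chat calls are stateless, so to continue a conversation you append the model's reply and your next message to the `messages` list and call `chat` again. The follow-up question below is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Continue the conversation: keep the assistant's reply and add a follow-up question\n",
"messages.append(response[\"message\"])\n",
"messages.append({\"role\": \"user\", \"content\": \"How does it relate to TensorFlow?\"})\n",
"\n",
"follow_up = chat(\"gemma2:2b\", messages=messages)\n",
"print(follow_up[\"message\"][\"content\"])"
]
},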
{
"cell_type": "markdown",
"metadata": {
"id": "HZGsL1FI7kj8"
},
"source": [
"#### Streaming Responses\n",
"\n",
"To enable response streaming, set `stream=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UdxAuS1C7lm_"
},
"outputs": [],
"source": [
"# Stream the chat response\n",
"stream = chat(\n",
" model=\"gemma2:2b\",\n",
" messages=[{\"role\": \"user\", \"content\": \"What is keras?\"}],\n",
" stream=True,\n",
")\n",
"\n",
"for chunk in stream:\n",
" print(chunk[\"message\"][\"content\"], end=\"\", flush=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "czWceNYOEizg"
},
"source": [
"#### Async client + Streaming\n",
"\n",
"To make asynchronous requests, use the `AsyncClient` class, and for streaming, use `stream=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YJmd92z1IUVl"
},
"outputs": [],
"source": [
"import asyncio\n",
"import nest_asyncio\n",
"from ollama import AsyncClient\n",
"\n",
"nest_asyncio.apply()\n",
"\n",
"\n",
"async def chat():\n",
" \"\"\"\n",
" Asynchronously sends a chat message to the specified model and prints the response.\n",
"\n",
" This function sends a message with the role \"user\" and the content \"What is keras?\"\n",
" to the model \"gemma2:2b\" using the AsyncClient's chat method. The response is then streamed.\n",
" \"\"\"\n",
" # Define the message to send to the model\n",
" message = {\"role\": \"user\", \"content\": \"What is keras?\"}\n",
"\n",
" # Send the message to the model and print the response\n",
" async for part in await AsyncClient().chat(\n",
" model=\"gemma2:2b\", messages=[message], stream=True\n",
" ):\n",
" print(part[\"message\"][\"content\"], end=\"\", flush=True)\n",
"\n",
"# Run the chat function\n",
"asyncio.run(chat())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cDr8VzvGIXFC"
},
"source": [
"## Conclusion\n",
"\n",
"Congratulations! You have successfully run inference on a Gemma model using the Ollama Python library. You can now integrate this into your Python projects."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"name": "[Gemma_2]Using_with_Ollama_Python.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}